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(c) The confidence level of the fitted model as the 
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(b) The selected data points for the second center. FIG. 208 
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BACKGROUND 


The problem of extracting nonlinear relationships in large 
high-dimensional scattered data sets is of central importance 
across fields of science, engineering and mathematics. In 
particular, diverse areas such as machine learning, optimal 
control, mathematical modeling of physical systems often 
rely significantly on the ability to construct relationships from 
data. Subsequently there have been a multitude of applica- 
tions including financial time-series analysis, voice recogni- 
tion, failure prediction and artificial intelligence all of which 
provide evidence of the importance for nonlinear function 
approximation algorithms. 

The beginnings empirical data fitting may be traced to 
Gauss's work on using least squares to construct linear mod- 
els. Over the last two decades we have seen a tremendous 
growth in this area motivated by new ideas for computing 
nonlinear models. Numerous references of prior art articles 
are cited herein. In most cases, when a reference is referred to 
herein, it is cited by its number (in square brackets) from the 
References section hereinbelow. Thus, e.g., for computing 
nonlinear models, the following references [11, 45, 40, 37, 
38] in the References Section disclose exemplary prior art 
techniques. 

Diverse areas such as machine learning, optimal control, 
mathematical modeling of physical systems often rely sig- 
nificantly on the ability to construct relationships from data 
such as provided by constructing robust approximation mod- 
els. Moreover, there have been a multitude of applications 
including financial time-series analysis, voice recognition, 
failure prediction and artificial intelligence all of which pro- 
vide evidence of the importance for nonlinear function 
approximation algorithms. Our interest in this problem 
relates to representing data on a manifold as the graph of a 
function [8, 9] and the reduction of dynamical systems (see, 
e.g. [10]). 

A common element in empirical data fitting applications is 
that the complexity ofthe required model including the num- 
ber and scale of representation functions is not known a priori 
and must be determined as efficiently as possible. A variety of 
approaches have been proposed to determine the number of 
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model functions, i.e., the model order problem. A generally 
accepted measure of quality of such data fitting algorithms is 
that the resulting models generalize well to testing data, i.e., 
data associated with the same process but that was not used to 
construct the model. This requirement is essentially that the 
data not be overfit by a model with too many parameters or 
underfit by a model with too few parameters. 


One general approach to this problem is known as regular- 
ization, i.e., fitting a smooth function through the data set 
using a modified optimization problem that penalizes varia- 
tion. A standard technique for enforcing regularization con- 
straints is via cross-validation [18, 44]. Such methods involve 
partitioning the data into subsets of training, validation and 
testing data; for details see, e.g., [19]. 

Additionally, a variety of model growing and pruning algo- 
rithms have been suggested, e.g., 

(1) the upstart algorithm by M. Frean, *A Upstart Algorithm: 
A Method for Constructing and Training Feedforward 
Neural Networks," Neural Computation, 2. (2):198-209, 
1990; 

(2) the cascade correlation algorithm by S. E. Fahlman and C. 
Lebiere, “The cascade-correlation learning architecture,” 
In D. S. Touretzky, editor, Proceedings of the Connection- 


ist Models Summer School, volume 2, pages 524-432, San 
Mateo, Calif., 1988; 


(3) the optimal brain damage algorithm by Y. Le Cun, J. S. 
Denker, and S. A. Solla, *Optimal brain damage", In D. S. 
Touretzky, editor, 4dvances in Neural Information Pro- 
cessing Systems, volume 2, pages 598-605. Morgan Kauf- 
mann, San Mateo, Calif., 1990; and 


(4) the resource allocating network (RAN) proposed by Platt 
in *A Resource Allocating Network for Function Interpo- 
lation?’ Neural Computation, 3:213-225, 1991. 


Statistical methods have also been proposed that include, 
e.g., Akaike information criteria (AIC), Bayesian information 
criteria (BIC) and minimum description length (MDL) [42], 
[2] and Bayesian model comparison [29]. In [39], [31] and 
[20] the issue of selecting the number of basis functions with 
growing and pruning algorithms from a Bayesian prospective 
have been studied. In [5], a hierarchical full Bayesian model 
for RBFs is proposed. The maximum marginal likelihood of 
the data has also been used to determine RBF parameters 
[34]. For a more complete list of references the reader, is 
referred to [21]. 


In general, model order determination via both regulariza- 
tion and growing and pruning algorithms can be computa- 
tionally intensive and data hungry. More importantly, how- 
ever, is that these algorithms do not explicitly exploit the 
geometric and statistical structure ofthe residuals (see Terms 
and Descriptions section hereinbelow) during the training 
procedure. In addition, many algorithms in the literature 
require that anywhere from a few to a dozen ad hoc param- 
eters be tuned for each data set under consideration. 


Accordingly, it is desirable to alleviate the modeling diffi- 
culties in the prior art, and in particular, at least provide model 
generating methods and systems that are computationally less 
intensive, and that reduce (preferably to zero) the number of 
model generation parameter required for generating a model 
of appropriate accuracy. 
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TERMS AND DESCRIPTIONS 


Radial Basis Functions. Radial Basis Functions (RBFs) were 
introduced for the function approximation problem as an 
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alternative to multilayer perceptrons [11]. Part oftheir appeal 
is the variety of efficient algorithms available for their con- 
struction. In the extreme, the basis functions may be selected 
randomly (following the distribution of the data) with fixed 
scales. In this instance the resulting optimization problem is 
simply an over-determined least squares problem to deter- 
mine the expansion coefficients. One may improve on this 
approach at modest expense by employing a clustering algo- 
rithm to determine the basis function centers [32]. Further- 
more, RBFs may be adapted with rank one updates or down- 
dates [33, 35]. Over the years RBFs have been used 
successfully to solve a wide-range of function approximation 
and pattern classification problems [6, 21]. More recently, 
RBFs have been proposed as a tool for the simulation of 
partial differential equations, see, e.g., [1]. 

An RBF expansion is a linear summation of special non- 
linear basis functions (although there seems to be some ambi- 
guity in the literature in that both the expansion and the 
expansion functions may be referred to as RBFs). In general, 
an RBF is a 


K (2.1) 
fa) = Ax + ag + 3 a G(llx— cull), 


k=1 


where x is an input pattern, q, is the kth radial basis function 
centered at location c,, and a, denotes the weight forkth RBF 
and A is an mxn matrix. The term W denotes the parameters 
in the weighted inner product 

Ix i TT 
mapping f: R"—R". 

Note that the metric ||*|| used in the above equations may be 
the standard Euclidean norm or an Lp norm where 
1<=p<infinity or an l-infinity norm, and in each case, the 
weighting matrix W is selected from nonsingular matrices to 
optimize a data fitting procedure using nonlinear program- 
ming techniques well known in the art; e.g., conjugate gradi- 
ent, Newton's method, and other quasi-Newton methods. The 
term Ax+a, affords an affine transformation of the data and is 
useful so that the nonlinear terms are not attempting to fit flat 
regions. More general polynomials may be used for this pur- 
pose [41]. As usual, the dimensions of the input n and output 
m are specified by the dimensions of the input-output pairs. 
Residual. The difference between an actual value, and a value 
determined by a model approximating the actual value. 

IID. An abbreviation of “independent and identically distrib- 
uted”. That is, in the present disclosure the term “JID” is 
directed to the residuals obtained from an approximation of a 
particular data set being modeled. In particular, the approxi- 
mation method and system disclosed hereinbelow determines 
whether such residuals are IID. 

Domain Neighborhood (Spatio-temporal window): An extent 
consisting of a cluster of data in a local region of the input data 
(i.e., in the domain space of the approximation model (to be) 
generated). E.g., if a process is evolving both in space and 
time then a domain neighborhood is referred to as a spatio- 
temporal window, or alternatively as a space-time ball. Such 
a spatio-temporal window is distinguished from a temporal 
window in that points in spatio-temporal window may have 
been generated at any time whereas a temporal window con- 
tains only data that was produced in the time interval that 
defines the window. 


SUMMARY 


A novel approximation method and system for performing 
nonlinear function approximation or modeling for scattered 
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data is disclosed. In particular, such modeling is performed 
without requiring the use of ad hoc parameters. The approxi- 
mation method and system is based on detecting (any) struc- 
ture in residuals between the data and its approximation. 
Accordingly, structure or information content in the residuals 
is quantified via a statistical hypothesis test to determine 
whether the residuals are IID (as described in the Terms and 
Description section hereinabove). In particular, an hypothesis 
test is iteratively applied to the residuals for determining 
whether a new approximation function (e.g., a basis function, 
and more particularly, a radial basis function as described in 
the Terms and Description section above) should be added, 
and if so, where should the support for the new approximation 
be located relative to samples of the data set. When a deter- 
mination is made that this test has been passed using the 95% 
confidence criterion it may be inferred that there is no (geo- 
metric) structure left in the residuals and thus no additional 
approximation functions are added to the model. 

One aspect of the present approximation method and sys- 
tem is its application to high dimensional domains (e.g., 
domains having 2 or more dimensions) using a spatio-tem- 
poral ball rather than a temporal window (as in [3, 4]) for 
constructing local training sets from the data set being mod- 
eled. This innovation is particularly critical if the data set is 
periodic or quasi-periodic or, more generally, can be repre- 
sented on a manifold. 

Since the approximation method and system does not use 
ad hoc parameters that must be set per data set modeled, no 
parameters are varied nor tuned for particular data sets. Thus, 
diverse data sets can be modeled without making any changes 
to embodiments of the approximation method and system. 
This aspect greatly accelerates approximation of the data set 
being modeled. 

In at least one embodiment of the approximation method 
and system, radial basis functions are used as the approxima- 
tion functions. However, it is within the scope of the present 
disclosure that other type of approximation functions may be 
used in various embodiments of the approximation method 
and system, in particular, multi-layer perceptrons, and/or 
feed-forward neural networks. 

To illustrate the absence of ad hoc parameters, numerous 
example data sets are presented and approximated hereinbe- 
low by exactly the same programmatic embodiment of the 
novel approximation method and system disclosed herein. 
That is, no adjustments or parameter settings were made to 
the programmatic embodiment based on the data set being 
approximated. Hence, the present approximation method and 
system approaches a black-box methodology for nonlinear 
function approximation. This feature of the present approxi- 
mation method and system permits the advancement of a 
variety of other processes, e.g., the representation of data on 
manifolds as graphs of functions [8, 9], pattern classification 
[22. 26], as well as the low-dimensional modeling of dynami- 
cal systems [10]. 

For simplicity, it is assumed herein that a data set to be 
modeled by the present approximation method and system 
represents a functional relationship, or signal, with IID addi- 
tive noise. For fitting data without noise, such as data gener- 
ated to high precision by numerical simulations, it is useful to 
add IID noise to the signal before applying the algorithm. 
Also note, however, that if the functional relationship or sig- 
nal contains multiplicative noise we can take the natural loga- 
rithm of the functional relationship or signal to make the IID 
additive as one of skill in the art will understand. 

It is an aspect of the present approximation method and 
system to use a spatio-temporal window, i.e., space-time balls 
(as described in the Terms and Description section herein- 
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above), for determining the samples within the data set being 
modeled when determining what portion of the data set is to 
be fitted with a new approximation function. It is believed that 
the use of such a spatio-temporal window is critical for 
approximating a data set over high-dimensional domains, and 
in particular, for a data set generated by dynamical systems 
such as those arising in a physical apparatus such as a heat 
exchanger and a chemical processing plant, or those arising in 
financial markets such as the value of a stock or stock index. 
In addition such dynamical systems arise in scientific and 
physical models such as the Navier Stokes equations in fluid 
dynamics as one skilled in the art will understand. The appli- 
cants have observed that significantly more data samples are 
located in such space-time balls than in the temporal windows 
used in prior art approximation techniques, in particular see 
[3, 4]. Moreover, it has been observed that the use of such 
spatio-temporal windows results in the construction of sig- 
nificantly improved models. 

Additionally, we extend the prior art by proposing an algo- 
rithm where the model output lives in spaces of dimension 
more than one. The simplest approach is to use as many 
models as the dimension of the output space (one for each 
dimension). Alternatively, a more parsimonious representa- 
tion arises when we extend the IID test on the residuals to the 
multivariate case. 

The present approximation method and system may be 
used successfully with data sets from signals composed with 
additive IID noise. This is an extension over prior approxi- 
mation techniques based on statistical hypotheses that are 
substantially restricted to Gaussian noise, or that do not pro- 
vide information concerning where the basis functions should 
be placed. 

The present approximation method and system may be 
used to model batch or predetermined data sets, or to model 
streaming data sets that are determined, e.g., in (near) real- 
time. Moreover, the present approximation method and sys- 
tem does not require the streaming data to be repeatedly 
reviewed as some “on-line” approximation techniques 
require. Also, although the algorithm was presented here in 
the context of growing radial basis functions, in principle it 
can be employed with other architectures for fitting nonlinear 
functions. 

Additional features and benefits will become evident from 
the accompanying drawing and the description hereinbelow. 
Such additional feature and benefits are considered subject 
matter for seeking patent protection even though they may not 
be identified or discussed in this Summary section. In par- 
ticular, the invention herein is defined by the claims as sup- 
ported by the entire specification. 


BRIEF DESCRIPTION OF THE DRAWINGS 


The patent or application file contains at least one drawing 
executed in color. Copies of this patent or patent application 
publication with color drawings will be provided by the 
Office upon request and payment of the necessary fee. 

FIGS. 1A-1C shows a pseudo code version of a novel 
algorithm for generating an approximation model, MODEL, 
from a training data set using basis functions (and in particu- 
lar, radial basis functions) to generate MODEL. 

FIGS. 2A through 2C show the graphs ofactivation or basis 
functions that may be used to generate radial basis functions 
used in the pseudo code of FIGS. 1A-1C. Note, that the 
mollifier function of FIG. 2A and the quarter circle functions 
are believed to be particularly novel as activation functions. 

FIG. 3A shows a graph of the pringle data set that is known 
in the art. 
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FIG.3B shows a data set to be modeled that is based on the 
pringle graph, wherein noise has been introduced. 

FIGS. 4A through 4D show plots of the autocorrelation 
(ACC) functions for the four major radial basis functions that 
capture the underlying structure of the data set of FIG. 3B. 

FIGS. 5A through 5D show the location and shape of the 
graphs 504a through 504d ofthe four RBFs that are generated 
by the algorithm of FIGS. 1A-1C when modeling the data set 
of FIG. 3B before the IID stopping criteria described herein- 
above is satisfied. 

FIG. 6A shows the maximum values of the corresponding 
ACC,,* function for each step in the training process of FIGS. 
1A-1C (i.e., steps 112 through 212). 

FIG. 6B shows the performance of the approximation 
model for approximating the data set of FIG. 3B in the RMSE 
sense as the number of assigned RBFs increases. 

FIG. 6C shows the confidence level at each stage of train- 
ing for approximating the data set of FIG. 3B. 

FIG. 7 shows a plot of the output of the approximation 
model generated using the four RBFs described FIGS. 4A 
through 6C together with the target values of the testing set. 

FIG. 8 shows the pseudo code of an embodiment of an 
algorithm for generating an approximation model for multi- 
variate output, the pseudo code here is similar to FIGS. 
1A-1C, but with somewhat different notation. 

FIG. 9 shows the pseudo code of another embodiment of an 
algorithm for generating an approximation model for multi- 
variate output, the pseudo code here is also similar to FIGS. 
1A-1C, but with somewhat different notation. 

FIG. 10A shows the training and testing data sets for a time 
series from a single mode skew Gaussian with Gaussian IID 
noise with standard deviation of 0.01. 

FIG. 10B shows a graph of an approximation model using 
modulated Gaussian-Gaussian RBFs in the algorithm whose 
pseudo code is shown in FIGS. 1A-1C. 

FIG. 11A shows a graph of an approximation model using 
regular Gaussian RBFs in the algorithm whose pseudo code is 
shown in FIGS. 1A-1C. 

FIG. 11B shows a graph of the RMSE versus the number of 
regular Gaussian RBFs added when generating the approxi- 
mation model graphed in FIG. 11A. 

FIGS. 12A, 12B, and 12C, respectively show the behavior 
ofthe one-dimensional Gaussian-Gaussian, Cauchy-Cauchy, 
and Gaussian-Cauchy RBFs. 

FIGS. 13A and 13B show two dimensional Gaussian- 
Gaussian and Cosine-Cauchy RBFs and their associated con- 
tour plots. 

FIG. 14A shows the initial ACF(h) (computed on the train- 
ing data; i.e., step 140 of FIGS. 1A-1C) for the Mackey-Glass 
Time Series. 

FIG. 14B shows the final ACFs of the residuals (used in 
steps 144 through 188 of FIGS. 1A-1C) that indicates that the 
model fitting process should be terminated given 9596 confi- 
dence has been achieved when modeling the Mackey-Glass 
Time Series. 

FIG. 14C and FIG. 14D show the associated ACC func- 
tions, i.e., the point-wise values f, corresponding to the 
maximum value of the ACF in FIG. 14A and FIG. 14B, 
respectively when modeling the Mackey-Glass Time Series. 

FIG. 15A shows a plot of y(h*) as new basis functions are 
added to the approximation model being generated when 
modeling the Mackey-Glass Time Series. 

FIG. 15B shows a plot of the RMSE of the approximation 
model being generated as new basis functions are added to the 
model for modeling the Mackey-Glass Time Series. 


US 8,521,488 B2 


15 

FIG. 15C shows a plot of the confidence level of the 
approximation model being generated as new basis functions 
are added for modeling the Mackey-Glass Time Series. 

FIG. 16 shows the output of a 76 mode approximation 
model for the test set of the Mackey-Glass Time Series com- 
pared to the actual or target values of this test set. For this 
present approximation model, an RMSE value of 0.0116 was 
obtained, and the 95% level of confidence for the stopping 
criteria (of FIGS. 1A-1C) was satisfied. 

FIG. 17 shows a graph of a data set consisting of daily 
values of the Deutsche Mark/French Franc exchange rate 
over 701 days. 

FIG. 18 shows the output of the resulting model (1-step 
prediction values) and the target (market) values for the test 
data from the daily values ofthe Deutsche Mark/French Franc 
exchange rate data. 

FIG. 19A shows a plot of y(h*) as new basis functions are 
added to the approximation model being generated when 
modeling the daily values ofthe Deutsche Mark/French Franc 
exchange rate data. 

FIG. 19B shows a plot of the RMSE of the approximation 
model being generated as new basis functions are added to the 
model for modeling the daily values of the Deutsche Mark/ 
French Franc exchange rate data. 

FIG. 19C shows a plot of the confidence level of the 
approximation model being generated as new basis functions 
are added for modeling the daily values of the Deutsche 
Mark/French Franc exchange rate data. 

FIGS. 20A through 20C show the Deutsche Mark/French 
Franc exchange rate data for the first 3 main RBFs approxi- 
mation model generated. Note the need for the spatio-tempo- 
ral window as evidenced in particular by FIGS. 20B and 20C. 
FIG. 20B shows two distinct time regions contributing to the 
local data indicating either a periodic or quasi-periodic 
behavior. 

FIG. 21 shows an operative embodiment of a system for 
generating approximation models according to FIGS. 1A-1C 
through 20C hereinabove, and the description herein. 


DETAILED DESCRIPTION 


Radial Basis Functions. 

Referring to the description of radial basis functions (RBF) 
inthe Terms and Description section hereinabove, the general 
training problem for the RBF is the determination of the 
unknown parameters: (A, o, C} W, K}. One object of the 
present disclosure is to determine an optimal value for K, i.e., 
the model order. Of course optimizing K depends on high 
quality values for the remaining parameters and we propose 
algorithms for this purpose. 

The requirements of the RBF expansion are the same as 
standard data fitting problems, i.e., given a set of L input- 
output pairs 


(050 cix ou) and yy ns 


the goal is to find the underlying mapping f such that 
yr-fx,). In the standard situation, there is more data than 
equations so it is not possible to approximate each such map- 
ping f exactly. Thus the problem is to minimize the cost 
function 


1 L 
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where now the metric |||| is generally the Euclidean inner 
product and is distinct from the W-weighted inner product 
used to compute the contributions of the basis functions. 
One of the attractive features of RBFs is the variety of basis 
functions available for the expansion. In particular, these 
functions come in both local and global flavors and include 


UNA nl}. 


These functions satisfy the criteria of radial basis functions as 
described in [38] and the associated theorem states that if D is 
a compact subset of R”, then every continuous real valued 
function on D can be approximated uniformly by linear com- 
binations of radial basis functions with centers in D. Thus, 
given any data set of D, such as X above, a function that 
approximates X can be constructed as a linear combination of 
radial basis functions with the center of each such radial basis 
function being in D as one of skill in the art will understand. 

For simplicity, the present description is restricted to (lo- 
cal) Gaussian radial basis functions, i.e., $(r)-exp(-r?) but 
the scope of the present disclosure is not limited to such 
specific functions. Indeed, the method disclosed herein is 
applicable to any admissible RBF; i.e. the interpolation 
matrix is nonsingular in the square case. Moreover, disclosed 
hereinbelow are several candidates for compact radial basis 
functions that are believed to have excellent conditioning 
properties associated with the over determined least squares 
problem. 

It is conventional in function approximation problems to 
distinguish between: (a) situations where: the data for which 
a model is desired is available at the outset and (b) situations 
where the data becomes available as the model is being built. 
In keeping with standard terminology, situations as in (a) 
above are referred to as “batch” training problems, and situ- 
ations as in (b) above are referred to “on-line training prob- 
lems. Although, the approximation method and system dis- 
closed herein is described for batch training, a similar method 
may be also used to generate an approximation model from 
on-line training. Accordingly, determining approximation 
models for on-line training problems is within the scope of the 
present disclosure as one of ordinary skill in the art will 
appreciate upon review of the novel approximation tech- 
niques described herein. 

Testing for Structure in Model Residuals. 

As indicated earlier, essential information can be inferred 
about the appropriateness of an approximation model by 
examining its residuals with the data set from which the 
model was derived. A premise for the present approximation 
method and system is that if there is (non-random) structure 
remaining in the residuals, then one or more additional basis 
functions should be added to the model to capture the struc- 
ture residing in the residuals. On the other hand, if there is no 
discernible structure in such residuals, then it is presumed that 
there is substantially no information content in the residuals 
to be models, and according such lack of structure constitutes 
a stopping criterion for the development of the corresponding 
model. 

Thus, the present approximation method and system is 
useful for modeling data that may be viewed as the superpo- 
sition of, e.g., a signal together with IID noise. However, if 
perchance, the data is noise free then IID noise may be added 
to the data so that the present approximation method and 
system may be employed thereon. In view of this, when a 
model is derived according to the present approximation 
method and system, it is expected that the model residuals 
will be IID while the model itself should represent substan- 
tially all of the geometric structure residing in the data. 
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Hence, an indication that an RBF model, derived from an 
embodiment of the present approximation method and sys- 
tem, is unsatisfactory is the failure ofthe residuals to satisfy a 
hypothesis test for being entirely or substantially IID noise. 
This observation forms the basic idea for the stopping crite- 
rion used herein. The primary advantage of such a criterion is 
that the hypothesis test for the criterion does not involve any 
ad hoc parameters that require adjustment. 

The IID test for determining structure in the residuals and 
its associated algorithm are described. 

Statistical Background for Testing for IID Noise. 

Assuming the terminology described  hereinabove, 
wherein f denotes the (current version of the) approximation 
model, residual for the nth data point or sample from which 
the model was derived is defined as 


€, Vr Na): 
Following [4], we denote the set of residuals for a model of 


order K, as 


RF-[e, n1 (3.1) 


where L is the cardinality of the training set. The standard 
definition for the sample autocorrelation function, p(h), 


(ACF) fora set of residuals e}, e3, e. ..., e; with sample mean 
e and lag h is defined as 
$0) (3.2) 
h) = —, 
e (0) 
where -L<h<L, and 
L-lhl (3.3) 


Reed l 
Hh) = L2. ath, e;). 


As proposed in [4, 3] we decompose the ACF into its com- 
ponent terms 


a(,e;)- (e; 7e Mee). (3.4) 


For a fixed lag h (where -L+1=h=0, h an integer), the 
quantity a(h, e,) is the contribution of the i” residual to the 
autocorrelation function. Below, the quantity a is further 
described, and it is shown that a reveals critical information 
concerning where the support for new basis functions should 
be relative to the series of points in the data set being modeled. 
Given the importance of a, it is referred to herein as the 
autocorrelation contribution. Accordingly, there is a function 
ACC, (or simply ACC if there is no confusion in the value of 
h) which may written as ACC(i)=a(h, ej), i71, . . . , L-h, for 
fixed h. 

IID Hypothesis Test. 

As indicated above, the addition of new basis functions is 
terminated when the residuals appear to have no further struc- 
ture. As a test for structure in the residuals, a determination is 
made as to whether the sequence of residuals is IID (i.e., 
independent and identically distributed). The relevant theo- 
rem from statistics states that for large L, the sequence of 
sample autocorrelations of an ID sequence U,, Us, ..., Uz 
with finite variance is approximately IID with normal distri- 
bution with mean zero and variance 1/L i.e., N(0, 1/L), as 
referred to in the literature [7]. Hence, ife,,e,,...,¢, is a 
realization of such an IID sequence, then the sequence of 
autocorrelations p(h), for 0</hISL-1 is approximately IID, 
as one skilled in the art will understand. Accordingly, this 
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implies that at least 95% of the sample autocorrelations 
should be bounded as follows: 


—1.96 


VL 


1.96 
< ph) < Eee. 


VL 


(3.5) 


Note that (3.5) above is also a standard statistical result; since 
when L is relatively large, the value 1.96 corresponds to the 
0.95 quantile of the normal distribution. Therefore, if one 
computes the sample autocorrelations for values ofh wherein 
O«IhIL-1, and finds that more than 0.05Ihl of the samples 
fall outside of the bounds of (3.5), then the IID hypothesis is 
rejected. Note, however, in practice such sample autocorre- 
lations do not need be computed for Ihl up to L-1, since ifone 
finds that for some hy more than 0.05Ih,! of the samples fall 
outside of the bounds of (3.5), the IID hypothesis is rejected. 
Additionally, if one sample autocorrelations fall far outside 
the bounds of (3.5), then the IID hypothesis is rejected, as is 
conventional in the statistical arts. 

Note that if the underlying model is known, i.e., the data to 
be modeled is known, there is a more accurate bound for IID 
hypothesis test, described in [7]. 

The above test of (3.5) can equivalently be written in terms 
of y? distribution. Given 


it has been shown in [7] that Q has a x? distribution with L-1 
degrees of freedom. The adequacy of the current (version of 
the) approximation model is therefore rejected at the level a, 
wherein o. may be the confidence of the x? test as one of skill 
in the art will understand. 

Additional Testing Possibilities. 

To show rigorously that a sequence of random variables is 
truly IID, higher moments (if they exist) also need to be 
considered, wherein such each higher moment n (n-3, 
4,...) is the expectation of the residuals of the model raised 
to the n" power. Thus, a 3”? order moment is the expectation 
ofthe cube ofthe random variables. Note that if a sequence of 
random variables is IID, then any function of the random 
variables is also IID. Thus, the autocorrelation function 
(ACF) is not only the sequence of residuals that may be used. 
In addition, e.g., squares, cubes and the absolute values of 
such residuals must also satisfy the termination test of (3.5) or 
its equivalent. Although such alternative/additional functions 
for higher moments are within the scope of the present dis- 
closure, for simplicity the present disclosure primarily 
describes using the ACF of the residuals for determining IID 
and thereby terminating the generation of additional basis 
functions. 

Note that it is believed that in many cases, the autocorre- 
lation tests of (3.5) above and its equivalent may be suffi- 
ciently effective as a termination condition. Moreover, the 
test of (3.5) and its equivalent do indeed provide a necessary 
and sufficient condition for a sequence to be white noise, i.e., 
the residuals have no discernible pattern indicative of infor- 
mation content. 

Lastly, it is noted that there are alternatives to the test 
described above based on the autocorrelation function for 
testing ND or white noise. Other such tests include the dif- 
ference-sign test, the rank test, and a test based on turning 
point, as described in [7]. These tests might also be applied as 
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stopping criteria individually, or in conjunction with the cur- 
rent test based on the autocorrelation function. In another 
embodiment, such a termination test may be formulated by 
considering the relationship between the data inputs and the 
residuals ofthe outputs, as described in [28]. For example, the 
covariance, g(h), between the residuals and input data, i.e., 


g(h) = (1/L)« 


i 


L 

ei * X(-4) 
=1 
where e, is the i^ residual and ¥,,.,,) is the (i-h)" input value. 
RBF Algorithm using Spatio-Temporal (Space-Time) Ball. 

In this section the details for generating an approximation 
model from batch data is described with reference to FIGS. 
1A-1C. In particular, the question of whether a new basis 
function should be added is answered by the IID test of (3.5) 
above (or its equivalent). Note that this test also indicates 
where the new basis function should be added. The concept of 
a space-time ball for defining local regions within the batch 
data is also described. 

Initially, it is assumed that the residuals of the data are 
equal to the original data (i.e., the x,’s hereinabove), and/or 
the original data with noise purposefully added thereto. 

The variable ran_flag is a flag indicating whether to con- 
tinue with the derivation of the approximation model, i.e., if 
ran_flag is 1, the derivation continues, and if ran_flag is 0, 
then the derivation terminates. Accordingly, in step 104, ran_ 
flag is set to 1. The variable K identifies the current order of 
the model. Accordingly, in step 108, K is set to 0. In step 110, 
the identifier, MODEL, that provides access to the data struc- 
ture for the approximation function being derived is initial- 
ized to NULL. Note that the data structure accessed by 
MODEL may be, e.g., a list of pointers, wherein each pointer 
may be used to access a particular approximation function. In 
step 112, a loop commences, the loop ending at step 216. 
Each iteration of this loop inserts an additional approximation 
function (e.g., a RBF) into the approximation model for 
MODEL. Accordingly, this loop continues for as long as 
ran_flag is set to 1. In the first step of the loop (step 116), a 
determination is made as to whether the approximation model 
being derived has no basis functions therein; i.e., MODEL is 
equal to NULL. If so, then in step 120 the values for the 
residuals, fe, 1^, .,, for this currently empty model are set to 
the training data set [f(x,)]7, , used for generating the 
approximation model; i.e., each residual e, of fe, 1^, , is 
assigned the value f(x,). 

Alternatively, if there is already at least one basis function 
in the model identified by MODEL, then the steps commenc- 
ing at step 132 is performed. 

Accordingly, in step 132, an inner loop (including steps 
136 and 140) commences for computing an autocorrelation 
function ACF. In particular, in step 136, for each lag (or shift) 
in the time domain h, 0<h=L-h, the following the autocor- 
relation term is computed for each residual e, where 1=i=L- 
Ihi: 


ah, een 766,76) 
wherein e is the mean of the residual samples e,. Note that 
various data structures for a may be utilized. For example, 
since both h and i are integers, a(h, ej) may be represented as 
an array or list ALPHA[h, i], -L+1=h=0, 1=i=L-thl, 
wherein each ALPHAJh, i] refers to a corresponding data 
structure including the ordered pair (e,, a(h, e,)). 
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Then in step 140, the value of the autovariance function 
ACF for h is computed as 


L-|h| 


ACF(h) = 22 ath, es). 
P 


Accordingly, steps 136 and 140 are performed until all values 
ofh, -L+1=h=0 are computed. 

Subsequently, in step 144, a determination is made as to 
whether at least 95% of the ACF(h)’s do not satisfy equation 
(3.5), or at least one of the values ACF(h) is substantially 
outside of the range identified in (3.5) hereinabove. If the 
sequence of ACF(h)'s is not determined to be IID, then steps 
148 through 184 are performed. Accordingly, in step 148 the 
value h* is determined as a lag h having a largest ACF value. 
Note, if there are two or more ACF(h) for this maximum, then 
one can be chosen at random. Accordingly, in step 152, a 
location for the center of a new basis function is determined. 
In one embodiment, this location is determined by first deter- 
mining the index i* of an autocorrelation contribution term 
that is largest, i.e., determine a largest o(h*, ej) for 1SiSL- 
|h*|. Then, the corresponding domain point x* is determined, 
i.e., X*<x,*. Step 156 is now performed, wherein the auto- 
correlation function ACC, *(1)-a(h*, ej), i=1,..., L-Ih*l is 
constructed. Note that since the array or list ALPHA[h, i] 
discussed above may be already computed, it is a simple 
matter to utilize the subarray/sublist ALPHA[h*, j], 1Sj=L- 
Ih! for the function ACC; *. Subsequently, in step 160 as an 
optional step, the function may be denoised by performing a 
denoising algorithm such as a wavelet denoising algorithm 
from one or more such algorithms well known to those skilled 
in the art. 

Now that the center x* of the new basis function has been 
found as described above, in step 164, a determination is 
made as to what data should be used to determine the scale 
and weight of the new basis function to be added to MODEL. 
For the function ACC, *(1)-a(h*, e,), the index i is inherited 
from the data labels of the input data (x,), and in the case ofa 
time-series, i corresponds to a time ordering. In practice, if 
ACC;* is plotted (as a function of i), the values of ACC,,*(i) 
decrease as i gets further away from 1* which is what is to be 
expected since 1* was selected to correspond with a local 
maximum in Equation (4.2). How quickly the values of 
ACC, *(1) decrease for both i>i* and i<i* is a property of the 
scale of the data and the approximation model (MODEL) 
being generated. 

For simplicity, it is assumed that ACC; *(1) decreases 
monotonically for both increasing and decreasing values of i 
until local minima are reached at the indices 1*<i* and r*>1*; 
wherein 1, r denote such left and right local minima, respec- 
tively. Accordingly, the following distances are computed in 
step 168 

d;-d(x;*,xi*) and d,=d(xjx,"), 


wherein these distances indicate the size of the data ball 
around the center x*. Accordingly, in step 172, the subset of 
the data employed to update the new basis function to be 
added is then X,,..,={xeX:|k-x*||E=d_}, 

where X is the entire training set, and the distance d, can be 
selected in a variety of ways. In one embodiment, d may be 
selected as 


d=max{d,d,}. 


However, alternative selections for d, may be where the ACC 
function either changes sign (where again we are moving 
away from the peak of the ACC function in both directions) or 
where the ACC function has a minimum where the value of 
the ACC function is small. 
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Note that X; p-a; may contain data whose indices have val- 
ues that are substantially different from i*, 1* and r*. For the 
original training data being time-series data, it is apparent that 
spatial neighbors within X;,.,, may not necessarily be tem- 
poral neighbors. Hence, this spatial-temporal windowing 
provided by X-a; has the potential to capture substantial 
training data that would not otherwise be captured. For peri- 
odic or quasi-periodic data, the inventors have found that the 
present space-time windowing is at least preferred if not 
essential. Note also that no smoothing of the ACC, *(1) was 
required to accurately determine X,,,.,; for the examples pro- 
vided herein. However, it may be possible that the ACC 
function can be very irregular. In such cases, the application 
of a smoothing algorithm to the ACC function can facilitate 
the computation of the zero crossings or local minima of the 
ACC function. 

Updating the Model. 

Still referring to the pseudo code of FIGS. 1A-1C, in step 
176, a new basis function to be added to MODEL is initial- 
ized, wherein preferably the new basis function is a radial 
basis function. However, other types of basis functions may 
be also added such as circle functions, bump functions, 
smooth piecewise defined functions with compact support 
where the non-zero segments could be cosines. Such other 
types of basis functions are within the scope of the present 
disclosure. For simplicity in the following description, it is 
assumed that the new basis function is radial basis function. 

Note that the expansion of the approximation model 
MODEL having K adapted previously added basis function 
orterms together with the one new basis function h(x; v) may 
be written as 


EDOS Oha), 


where 


h(x;v)-as(x-clw), 
with 

q being the radial basis function, e.g., any one of the radial 

basis functions described in the New Radial Basis Func- 
tions section hereinbelow; 

c being the center of this new radial basis function; 

a being the weight as described in the Terms and Descrip- 

tions section hereinabove; 

the parameter w is described in the Radial Basis Functions 

description provided in the Terms and Descriptions sec- 
tion hereinabove; and 

the parameter v=[c, o, a]” being the vector of parameters to 

be optimized, wherein o is a vector of width parameters 
that can be described as a special case of a weighted 
norm described below where W (as described herein- 
above) is a diagonal matrix. Note that o determines the 
parameter W as described further below. 

The new radial basis term h(x; v) is determined by an 
iterative optimization method using the error as a cost func- 
tion, wherein the data set X,,..,, is used to perform the opti- 
mization according to Step 180. In one instance the optimi- 
zation is unconstrained, but in another implementation it is 
constrained such that the error on the data not being used to fit 
the model parameters does not increase, or does not increase 
significantly. In our nonlinear optimization routine we use 
variations of steepest descent, the conjugate gradient method 
as well as other quasi-Newton methods and Newton's method 
in addition to alternating direction descent. Accordingly, h(x; 
V) 1s first initialized as 


hG;v)-aa(lx-colw) 
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where: 
the center c, of this new radial basis function is x* which is 
where much of the residual structure appears to reside; 
the weight o which initialized as described hereinbelow; 
and 
the parameter w is described in the Radial Basis Functions 
description provided in the Terms and Description sec- 
tion hereinabove. 
Note that the vector o can be initialized to the diagonal 
elements of the covariance matrix of the local data; i.e., o is 
initialized to 


Ooy diag(Cov(Xccat))- 


Note that W is then the diagonal matrix having the elements of 
© on its diagonal. Accordingly, the diagonal matrix W is the 
weighting term in the inner product and has its diagonal 
elements equal to the components in o. This initialization of 
the weights has proven to be extremely valuable in acceler- 
ating the convergence of the conjugate gradient iteration. The 
initial value for the weight, a, is calculated via least squares 
using the initial values for center location and widths. That is, 
we use a singular value decomposition of the interpolation 
matrix to compute the least squares solution, as one skilled in 
the art will understand. 

Given the above initializations, in step 180, the scale 
(width), weight and the center location, (i.e., these values 
collectively being the parameter v above), of the new basis 
function are optimized using the conjugate gradient method 
with cost function 


(4.3) 


Loi U^ 
Ely) = min ER lla; v) — yll5- 
X€X local 


over the values in the domain neighborhood X ,,...., where the 
metric ||*||,7 is standard Euclidean norm. Weighted Euclidean 
norms and Lp norms are also standard error measures in the 
present disclosure. Only the parameters associated with the 
new basis function are optimized; i.e., all other parameters are 
kept fixed. It another embodiment parameters associated with 
neighboring basis functions may be adapted simultaneously. 

Subsequently, in step 184, the optimized new basis func- 
tion h(x; v) is added to the approximation model MODEL, 
and in step 188, the order of MODEL is updated to indicate 
the current number of basis functions therein. 

After step 188, step 196 is performed wherein MODEL is 
evaluated over the entire training data set. Then in step 200, a 
new set of residuals is determined over the entire training set 
for the current version of MODEL. Subsequently, in step 204, 
runtime performance measurements may be also computed. 
In particular, a root mean square error (RMSE), as provided in 
(4.4) hereinbelow, may be computed together with the value, 
ACF(h*), of the autocorrelation function ACF. Then in step 
208, a confidence level or measurement is computed that is 
indicative of how closely MODEL approximates the training 
set. In one embodiment, such a confidence level may be 
determined using the hypothesis test for IID noise using the 
autocorrelation function, as described in the IID Hypothesis 
Test section hereinabove. Note that this test is only a neces- 
sary condition when applied to the residuals. In one embodi- 
ment, the confidence level may be determined by applying the 
autocorrelation test to functions of the residuals, e.g., the 
squares of the residuals. The confidence levels for the differ- 
ent functions ofresiduals may be computed simultaneously to 
produce a more robust test. For example, the IID test may be 
applied simultaneously to the residuals, the squares of the 
residuals and the cubes of the residuals and the confidence 
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level of each set of residuals may be required to be at the 95% 
confidence level. Alternatively, the different functions of 
residuals may be computed sequentially to reduce the 
expense of step 208. 

The confidence level is used in step 212 to determine 
whether the approximation model fits the training data close 
enough. That is, if the confidence level is above a predeter- 
mined threshold (e.g., 9596 or higher), then in step 216, the 
flag, ran flag is set to 0 which will cause the process to 
terminate when step 112 is subsequently evaluated. 

Examples are presented hereinbelow that illustrate the 
effectiveness of this stopping criterion of step 204. However, 
other stopping conditions are also within the scope of the 
present disclosure. In particular, the stopping confidence 
level may be provided as a parameter that may vary depend- 
ing, e.g., on the training set, etc. Thus, confidence levels in the 
range of 87% to 97% may be appropriate. Additionally, it is 
within the scope of the present disclosure, to determine trends 
in the autocovariance function as well as the RMSE. Note that 
these stopping criteria are evaluated on the training data as 
well as the validation data, wherein the training data is a 
subset of the data from which training progress may be 
inferred. 

If in step 144, it is determined that the disjunction of this 
step is false, then step 192 is 


(4.4) 


performed. That is, ran_flag is set to 0 for terminating the loop 
commencing a step 112, and thus terminating the process. 
where T is the number of test points. To compare the results 
provided herein with other various other approximation tech- 
nique, two forms of the Normalized Prediction Error (NPE) 
are also computed, namely, 


T (4.5) 
Sei 
izi 
NPE = = 
2 lvi - »l 
i 
and 
T (4.6) 
2,4 
i=l 
NPE: = 5 P 
ii- yy 


It is also interesting to note that the quantity ACF (h*), i.e., the 
total contribution to the autocorrelation function at lag h*, 
monotonically decreases with the number of basis functions 
and becomes very flat when the algorithm of FIGS. 1A-1C 
has converged. Hence, ACF(h*) may be used as a sort of 
statistical no progress criterion. We have compared the usual 
no progress criterion on the root mean square error with the 
idea of a no progress criterion on ACF(h*) and have found the 
latter to be more robust. Although we did not need to use 
either as stopping criteria for the example applications here- 
inbelow, it is possible they could be useful as a no progress 
criterion with other data sets. 

Thus, unlike currently available techniques for nonlinear 
function fitting over scattered data, the method disclosed 
hereinabove requires no ad hoc parameters. Thus, the number 
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of basis functions required for an accurate fit is determined 
automatically by the algorithm of FIGS. 1A-1C. After the 
introduction of a collection of new radial basis functions in 
the following section, approximation modeling to several 
illustrative problems including modeling data on manifolds 
and the prediction of financial time-series are presented, 
wherein the models are determined using the algorithm of 
FIGS. 1A-1C. Moreover, note that FIGS. 1A-1C is presented 
in the context of RBFs but in principle can be employed with 
other methods for function approximation such as multi-layer 
perceptrons. 
Extensions of the Algorithm to High-Dimensional Domains 
and Ranges 

Analgorithm for constructing nonlinear models from high- 
dimensional domains to high-dimensional ranges from scat- 
tered data is now disclosed. Similar to the univariate case 
hereinabove, the algorithm progresses iteratively adding a 
new function at each step to refine the model. The placement 
of the functions is driven by a statistical hypothesis test in 
higher dimension that reveals geometric structure when it 
fails. At each step the added function is fit to data contained in 
a higher dimensional spatio-temporally defined local region 
to determine the parameters, in particular, the scale of the 
local model. Unlike the available non-linear function fitting 
methods that leave the extension of the algorithm to higher- 
dimensional ranges as a trivial extension ofthe single-dimen- 
sional range, we provide more parsimonious models using 
inter-correlation among the successive outputs. As in the 
univariate range case, this algorithm does not require ad hoc 
parameters. Thus, the number of basis functions required for 
an accurate fit is determined automatically by the algorithm. 
These advantages extend the scope of applicability of the 
univariate algorithm to a much larger class of problems that 
arise in nature and addressed in different areas of science. A 
novel feature of present disclosure is the development of the 
statistical hypothesis test that leads to a geometrical interpre- 
tation of structure in higher dimensions. 
Testing for Structure in Multivariate Model Residuals 

Denote the set of residuals for a model of order K, as 


Rafe} ats 


where e=y-f(x,), is the m-variate residual of the n” data 
point. L is the cardinality of the training set, u is the mean 
vector E(e), and I(h)-E(e,,,,e')-uu' is the covariance matrix 
at lag h. An unbiased estimate for u is given by 


L 
z= a/b)" en. 
n-l 


A natural estimate of the covariance matrix I'(h)-E[(e, ,;, -11) 
(e, 1) ]- y)”, ,-, is given by the description hereinbelow, 
wherein the ACVF term refers to the autocovariance function 
ofthe multi-variant residuals. 


L-h 
a(h, ej) if O<h<n-1 
k=1 


Pin, 


Sl 


Km = 


if —n+1<h<0 


Similarto the univariate case we decompose the ACVF into 
its components as o(h, e,)=(e;,,,-€) (e,-e). Further more 
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ah, e, S= (e n-e) (e/,—e) is the (i, j)-component of a(h, 
ez). In other words, 


. ee Am 
yg() = Cove; ps e) = 2. alh, ei, ef). 


For a fixed lag h the quantity a(h, e,) is the contribution of the 
kth residual to the autocorrelation function. And the quantity 
a(h, e’,, &/;) is the contribution of the i and jth time series at 
kth residual to the autocovariance function. Later we focus on 
this quantity c and will illustrate that it reveals critical infor- 
mation concerning where new basis functions should be 
placed. 


The estimate of the correlation matrix function R(-) is given 
by 


A E 4.3 
RO) = Db; 0T". = [9,000,090 7 | e 


yn 
n 
Lj li, j=1 


Where y,(h) is the (i, j)-component of f'(h). If i=j, P,, 
reduces to the sample autocorrelation function of the ith 
series. Forthe asymptotic behavior and the convergence prop- 
erties of the sample mean and covariance functions see [7]. 

As was mentioned in the univariate case described herein- 
above, we seek to terminate the addition of new basis func- 
tions when the residuals appear to have no further structure. 
As a test for structure, we consider whether the residuals are 
IID. In what follows we provide the definition of multivariate 
white noise. The m-variate series {e,}, tez, is said to be white 
noise with mean O and covariance matrix X, written as 
(e, 4 -WN(0, X) ifand only ife, is stationary with mean vector 
0 and covariance matrix function 


X, ifhz0 (4.4) 
NOE : 
0, otherwise 


{e,}~IID(0, X) indicates that the random vectors {e,} are 
independently and identically distributed with mean 0 and 
variance 2. 


In general, the derivation of the asymptotic distribution of 
the sample cross-correlation function is quite complicated 
even for multivariate moving averages, [7]. The methods of 
the univariate case are immediately adaptable to the multi- 
variate case. An important special case arises when the two 
component time series are independent moving averages. The 
asymptotic distribution of p, ,(h) for sucha process is given in 
the following theorem, 


Theorem [7]: Suppose that 


e 4.5) 
Xa = Y, 0Z 1 Za} ~ HDO, oF), 


j=-0 


= (4.5) 
Xa = >) BiZ-j2 Ma] ~ UDO, 07), 
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where the two sequences {Z,,} and {Z,,} are independent, 
Zla,|<co and 21B,|<co. Then if h=0, 


ds (4.7) 
Pi) is AN |o, 7! Y" puden |. 


j=% 


If h, k=0 and hxk, then the vector (p,,(h), 9,4(h)) is 
asymptotically normal with mean 0, variances as above and 
covariance, 


V (4.8) 
n S eu(DenG € k - A. 


j= 


Without knowing the correlation function of each of the 
processes it is impossible to decide if the two processes are 
uncorrelated with one another. The problem is resolved by 
prewhitening the two series before computing the cross-cor- 
relation p, >(h), i.e., transfer the two series to white noise by 
application of suitable filters. In other words any test for 
independence of the two component series cannot be based 
solely on estimated values of the cross-correlation without 
taking in to account the nature of the two component series. 
Note that since in practice the true model is nearly always 
unknown and since the data X,, t0, are not available, it is 
convenient to replace the sequences {Z,} by the residuals, 
which if we assume that the fitted models are in fact the true 
models, are white noise sequences. To test the hypothesis Ho 
that {X,,} and {X,,} are independent series, we observe that 
under Ho, the corresponding two prewhited series (Z,,] and 
{Z} are also independent. Under Ho, the above theorem 
implies that the sample autocorrelations p (a) and p,.(k), 
hzk of {Z,,} and {Z,,} are asymptotically independent nor- 
mal with mean 0 and variances n”*. An appropriate test for 
independence can therefore be obtained by comparing the 


values of Ip, (h)I with 
-1 
1.9602. 


If we prewhiten only one ofthe two original series, say {X,, }, 
then under H, the above theorem implies that the sample 
cross-correlations p, ,(h) and p, (k), h#k, of {Z,,} and [X4] 
are asymptotically independent normal with mean 0 and vari- 
ances n”* and covariance n^! p4,(k-h). Hence for any fixed h, 
(,>(h) also falls (under Ho between the bounds 


-1 
+1.96n 2 


with a probability of approximately 0.95. 

Therefore, ifone computes the sample cross-correlation up 
to lag h and finds that more than 0.05 h of the samples fall 
outside the bound, or that one value falls far outside the 
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bounds, the IID hypothesis is rejected. This test can equiva- 
lently be written in terms of y? distribution. Given 


L-1 
a 23 
Q-Lpiiy = Lx Pia. 
jal 


it has been shown in [32] that Q has a x? distribution with L-1 
degrees of freedom. The adequacy of the model is therefore 
rejected at level a if 


Q7? L- 0. 


The discussion hereinabove of the Theorem closely fol- 
lows the presentation in [7]. 

Multivariate Incremental Algorithm 

Pseudo code of this algorithm is provided in FIG. 8. 

Although the notation is somewhat different, the main 
difference between the univariate algorithm of FIGS. 1A-1C 
and the multivariate algorithm of FIG. 8 is the statistical 
hypothesis test. The question of whether a new basis function 
should be added is answered by the IID test. We shall see that 
this test also indicates where the new basis function should be 
initialized. First we compute the autocorrelation functions of 
all m time series. If all of these pass the WN test, then the 
cross-correlations among the time series are considered. If 
there is structure in the auto-correlations or cross-correlations 
of the time series then the IID will be rejected. 

The next requirement is to determine where the new basis 
function should be located to optimally reduce the structure in 
the model residuals. We look for the point in the domain that 
makes the largest contribution to the auto or cross correlation 
which has caused the test to fail. 

This is accomplished by observing that the residuals are 
associated with the data in the domain in a one-to-one man- 
ner, i.e., there is a mapping, say y, from a data point to its 
higher dimensional residual of the form e,=\(y,,). Thus, by 
identifying the residual associated with the largest contribu- 
tion to auto or cross correlation we may identify the location 
in the domain where the basis function should be added. To 
actually find this point first we determine the exact lag for 
which the correlation function, *; (h) reaches its maximum 
value h*, i.e., 


h*=arg maxy, (h),h>0. (4.9) 


Then, we find the point in the spatial domain that has the may 
contribution to the associated ACF for lag h=h* by solving 


(4.10) 


Thus the center for the new basis function is given by 


X,*7W 1 (e;*), 
where y”? is the inverse of the function For simplicity, we will 
refer to this center location as x *. 

Now that the center of the new basis function has been 
found it is necessary to determine what data should be used to 
determine the scale and weight of the new RBF. Similar to the 
univariate case, [65,89], consider the function f, —o(h*, e’,, 
e’,). The index k is inherited from the data labels and in the 
case of a time-series corresponds to a time ordering. For 
simplicity, we assume that $”, decreases monotonically for 
both increasing and decreasing values of k until crosses zero 
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at the indices 1*<i* and r*>i*; here we use 1, r to indicate left 
and right, respectively. We now compute the distances 


dr-d(y, xu) 


and 


d,-d(y;* X, *) 
as these indicate the size ofthe data ball around the center y*. 


The subset of the data employed to update the added basis 
function is then 


Xioca AXE IX.) 


where x is the entire training set. The distance d, can be 
selected in a variety of ways and here we select 


d=max{d,d,}. 


Note that Xoca; now may contain data whose indices have 
values that are substantially different from i*, 1* and r*. 
The new term in the expansion is initialized and optimized 
similar to the univariate case. The center c, is initialized at the 
point of most structure according to our test, i.e., c¿=x*. The 
vector of widths a is very effectively initialized using the 
diagonal elements of the covariance matrix of the local data, 


ooy diago Xocal). 


Note here that W=diag(o,). The initial value for the multi- 
variate weight, o, is calculated via least squares using the 
initial values for center location and widths. Then the param- 
eters associated to the new basis function is optimized in a 
nonlinear optimization procedure. 

Similar to the univariate case we could use one of the 
statistical tests, RMSE or normalized prediction error or 
another measure of structure as stopping criteria. 

A Novel Alternative Method 

In the present section, the multivariate range values are 
considered as points in higher dimensions. We intend to use a 
test analogous to the univariate case, but extended for higher 
dimensions. FIG. 9, provides an embodiment for such an 
extension for higher dimensions. 

Note that although the notation appears very similar to the 
univariate algorithm in this algorithm we are dealing with 
multi-dimensional domain and range values. 

As an initial step to adapt a statistical test we employ x? 
with m?-1 degrees of freedom for each lag. Note that both the 
nature of the hypothesis test as well as the number of degrees 
of freedom employed may be varied. 

The details of the algorithm for the determination of the 
local ball (i.e., the domain neighborhood), initialization ofthe 
center, width and weight of the new RBF are given in FIG. 9. 
Modulated Asymmetric Radial Basis Functions 

In the present section, a new class of functions is presented 
that can be used as RBFs and show their flexibility in data 
fitting over high dimensional domains. This includes non- 
symmetric RBFs with compact or non-compact support. The 
developed RBFs are well suited for a wide range of training 
algorithm that is being used for data fitting, [21]. We refer to 
this extended class of functions as modulated asymmetric 
RBFs or, alternatively, skew RBFs. 

In general, model order determination via both regulariza- 
tion and growing and pruning algorithms can be computa- 
tionally intensive and data hungry, however the right choice 
of RBFs makes this job substantially easier. Hereinabove we 
have observed that the nature of the condition number asso- 
ciated to an RBF model depends very significantly on the type 
of RBFs and the number and scale of representation func- 
tions. The advantage of modulated asymmetric RBFs is that 
they are able to better fit the geometric structure ofthe data at 
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each step during the training procedure. For certain types of 
data, in particular data which is heavily skewed such as 
boundaries, or edges, in images, modulated asymmetric 
RBFs afford more parsimonious models than their non- 
modulated RBF counterparts. 

Carefully observing data fitting using the non-modulated 
symmetric RBFs, indicates that there is significant room to 
improve on the way these functions match the structure of the 
data. We present a simple example to illustrate this idea. For 
purposes of illustration, we have created a time series from a 
single mode skew Gaussian with Gaussian IID noise with 
standard deviation of 0.01. The training and testing data sets 
are shown in FIG. 10A, wherein the testing and training data 
sets for the skewed data set were generated with the param- 
eters, c=2, t--7, a=1, o=1. And the output ofthe single mode 
model using modulated RBFs. The fit using modulated Gaus- 
sian-Gaussian RBFs via the algorithm given hereinabove, is 
shown in FIG. 10B. The skew Gaussian fit is done only with 
one RBF and the RAISE of the final fit is 0.0029. The algo- 
rithm terminates with 98.8% of confidence. The fit using 
regular Gaussian RBFs is shown in FIG. 11A. In this case, as 
shown in FIG. 11B, the fit requires 13 RBFs and RMSE of the 
final model is 0.0035, while 96.80% of confidence was 
achieved at termination of the algorithm. 

Definition of Modulated RBFs 

Based on the observations made in the previous section, if 
the data being fit is asymmetric, then breaking the symmetry 
of the basis functions may produce models of lower order and 
potentially higher accuracy. For example, fitting a discontinu- 
ous Heaviside function with symmetric radial basis functions 
gives rise to a Gibbs type phenomenon that is not present with 
appropriately skewed RBFs. Modulated RBFs may be gen- 
erated by multiplying a non-symmetric scalar function (that 
possesses multiple shape parameters) to any form of symmet- 
ric RBFs. This basic idea is reminiscent of skewed distribu- 
tions studied in probability theory but is more general in that 
we do not require the shape function to be a cumulative 
distribution function nor the symmetric RBF to be a probabil- 
ity distribution function. Note that it is also distinct from 
asymmetric RBFs referred to as normalized RBFs which are 
motivated by probabilistic considerations and do not afford 
the same variety of shape parameters. 

The idea of modeling skewed distribution functions in 
statistics can be traced back to 1908, [82], where perturbation 
ofthe normal density via a uniform distribution function leads 
to a form of skew-normal density. Although it is mathemati- 
cally somewhat different from the form that is presented in 
current literature, its underlying stochastic mechanism is inti- 
mately related. Fundamental skew-symmetric distributions 
are studied in [53]. For specific references on skew Cauchy 
distributions, see [66, 85, 54], for skew t distributions, [96, 
59], skew-logistic distributions, [141], and skew-elliptical 
distributions, [70]. We would like to concentrate on the spe- 
cific formulation of skew-normal distribution. The formal 
definition ofthe univariate skew-normal (SN) family is due to 
Azzalini [56]. A random variable Z has an SN distribution 
with skewness parameter A, and is denoted by Z~SN(A), if its 
density is f(zlA)=29(z)P(Az), with z is a member of R, A is a 
member of R. $ and ® are pdf and cdf of N(0, 1), respectively. 
The case where A=0, reduces to N(0, 1). Further probabilistic 
properties of this distribution are studied in [56, 57]. The 
multivariate SN family of densities are introduced in [60] 
which is given by f(zlA)=2,(z)®, (Az), z is a member of R^, 
A is a member of R*. $, is the probability density functions of 
k-dimensional normal distribution, N¿(0, 1). Similar to what 
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is mentioned above the case where A=0 corresponds to N,(0, 
Ij). Further properties ofthe multivariate SN distribution are 
studied in [58]. 

A closer look at two types of multivariate skew-normal 
distributions is provided, and connections are drawn to RBFs, 
[92]. They are: 


Sos p, E, D) = "at i, DO [DO - 0] 


1 
®,(0; 1+ DED’ 


where ue R^, 2>0, D(pxp), $C; u, E) and 6, (., 2) denote the 
pdf and the cdf of p-dimensional symmetric distribution with 
mean u and covariance matrix 2>0, respectively. Note that in 
this case we have a p integral with upper bounds of D(y-), 
[80]. 


F 01:926, (1:1,)0, [A7 (y-w)] 


where A is a vector of length p and ®, is the one dimensional 
cdf of the given distribution. In other words 


To [60] 
$, [A (y - 1] - f oi (x; p, E) dx, . 


The latter formulations is used to motivate modulated 
RBFs. 
Let 


n 


w= >) med, 


izl 


where 6° .60)-s, (x. Ho WC 1w,), and 


Tr 
Sp(X, Hi, w= f pily)dy 


with 


s, C) is the asymmetric component of the RBF. The param- 
eters œ, o=diag(W,), i, and W, are learned from the data. 
Note that RBFs do not require the normalization factor given 
in their analogue probability distributions. 

To generate modulated RBFs we could combine different 
cdfs and pdfs of variety of distributions. For examples, Gaus- 
sian-Cauchy RBFs, Cosine-Sine RBFs, Cosine-Cauchy 
RBFs, and many others. All we need is a nonlinear modulator 
that can produce flexibility. 

FIGS. 12A, 12B, and 12C, respectively show the behavior 
of the one-dimensional Gaussian-Gaussian, Cauchy-Cauchy, 
and Gaussian-Cauchy RBFs. The skew parameter ranges 
from -10 to 10 with the increments of one. FIGS. 13A and 
13B show two dimensional Gaussian-Gaussian and Cosine- 
Cauchy RBFs and their associated contour plots. 

It is important to note that compactly supported modulated 
RBFs can be generated using compactly supported RBFs 
introduced hereinabove. 
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We propose a class of RBFs that are symmetric with adap- 
tive curvature. The flexibility in the curvature produces a 
better fit to data. Note that some of these classes remain 
positive definite for different values of the curvature param- 
eter. 

Modulated RBFs that are Positive Definite 

Traditionally, RBFs are developed in the context of inter- 
polation. Here we would like to show that it is possible to 
generate modulated RBFs that produce positive definite inter- 
polation matrix. We pursue an approximation theoretic solu- 
tion to the problem. We start with a positive definite function 
and show that up to certain limits for the modulation param- 
eter the interpolation matrix remains positive definite. We 
utilize ideas form perturbation theory and the lower bounds of 
the inverse of an interpolation matrix [69, 118, 128, 133, 114, 
117, 129]. 

5.3 Impact on Other Type of Networks 

We have also considered the connections to other type of 
networks, e.g., support vector machines. The support vector 
machines (see e.g., [139]) are connected to radial basis func- 
tions viathe RBF Kernels [132]. At the heart ofthe SVM there 
is an inner product which could be computationally expen- 
sive. Based on Mercer's theorem [113], the positive definite 
inner product kernel's are used to facilitate this work. In this 
study we would like to introduce modulated Kernel SVMs. 
Clearly modulated RBFs possess the property that K(y, X )-K 
(x', X) and we have shown the bounds in which the given 
kernel remains positive definite. For relation between RBFs 
and multilayered perceptrons see, e.g., [11, 88]. The method 
of mixture models immediately benefit from this structure if 
it assumes the basis functions to be modulated. 

5.4 Impact on Signal and Image Processing 

To show the promise of these RBFs we look at Mackey- 
Glass time series, which produces a map from R* to R. In 
[89] we report 76 RBFs to get the 9596 confidence with 
RMSE of 0.0116. The same confidence with a similar error 
rate is achieved with 42 modulated Gaussian RBFs. 

The numerical experiments suggest that the proposed 
radial functions significantly extend the scope of currently 
used RBFs and improve the computational cost as well as the 
the complexity of the models. It is important to note that the 
modulated RBFs are capable of producing models with 
smaller error even if the the model based on symmetric RBFs 
has the same order as the model based on asymmetric RBFs. 
We intend to explore the capability of these innovative RBFs 
in the context of modeling data on manifold and prediction of 
financial time-series using the algorithms proposed herein- 
above (i.e., FIGS. 1A-1C, 8, and 9). 

Image Reconstruction Application 

Also within the scope of the present disclosure are appli- 
cations of these RBFs in the context of image reconstruction, 
or, more generally, where the data is spatial and there is no 
time parameterization. For this purpose, it is believed that the 
following algorithm is useful. 

Algorithm À Proposed Algorithm for Image Reconstruc- 
tion. 

Identify the edges using a simple edge detector. 

Find the most dominate edge. 

Adapt a modulated RBF to a patch of data around the 
identified point. Repeat the procedure till a satisfactory result 
1s achieved. 

This algorithm differs from that described in FIGS. 1A-C 
in that there is no time ordering of the data points. Of course 
one may artificially endow a set of data with a time-param- 
eterization by connecting the points arbitrarily by a curve. 
Alternatively, it is useful to identify points in the spatial image 
that have the most structure and these should be candidates 
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for adding basis functions. In image processing a measure of 
structure is the image gradient which reflects the presence of 
an edge when it is large. Thus, one embodiment of the pro- 
posed algorithm proceeds by iteratively placing functions 
where the image gradient is a maximum. This process con- 
tinues until the image gradient at all points on the image is 
sufficiently small. The size of small is application dependent. 
Of course the IID test may be employed in conjunction with 
this algorithm. 

More generally, one may use other measures of geometric 
structure to guide the placement of basis functions. 

Since the above algorithm fits the structure of the data and 
does not capture the noise, by design, this approach is well 
suited for image de-noising as well as image compression. 
Another Variation (Symmetric) on RBFs 

An RBF based on subtraction of two log-sigmoid functions 
is used in [111], to generate localized robust RBFs. In [55], an 
extension of the robust RBFs is presented that uses a com- 
posite product of log-sigmoidal functions to make localized 
RBFs. In [98], and related prior work on reformulated RBFs 
aim to facilitate training by supervised learning based on 
gradient descent. The approach is based on selecting an 
admissible generator functions. For example, for in 71, the 
exponential generator function g,o(x)=exp(P,x), P,>0, corre- 
sponds to go =exp Bx 1-m) which leads to Gaussian RBF 


2 
(x) = g (2) = exp] —- 
J J cj d 


with 


Linear generator function g,.=a,~+b,, 0/70, 
RBFs of the form 


b,=0, produces 


1 
pja) = gj") = (ajx? + bj), 


with m>1. For m=3, this corresponds to the inverse multiqua- 
dratic RBF, 


Nie 


2 
= 8,2) =—— 
$j = Bj) ug 


In [77] a certain class of oscillatory radial functions are pro- 
posed as suitable candidates for RBFs that lead to non-singu- 
lar interpolants with the feature that the scaled version 
becomes increasingly flat. The aim is to generalize traditional 
spectral methods to completely general noise layouts. The 
RBF has the following form 


Jg (em) 
par) = ~——. 
em 
d=1, 2,..., where J¿(r) denotes Bessel function of the first 


kind of order a. These RBFs will give nonsingular interpola- 
tion up to d dimensions and d=2. 


US 8,521,488 B2 


33 


Additional New Radial Basis Functions 

Most applications employ activation or basis functions 
from a relatively small list, including Gaussians, multi-quad- 
rics and thin plate splines. Recently several functions with 
compact support have proposed as candidate RBFs, see, e.g., 
[143, 142, 144]. For example, the C? function 


$()- (0-147), (5.1) 


has been derived as an RBF explicitly for domain dimension 
4 in the sense that the resulting square interpolation matrix is 
a (conditional) positive definite matrix [143]. In many cases 
of practical interest it appears that this interpolation condition 
is overly restrictive. In general, data fitting problem one is 
usually confronted with solving an over determined least 
squares problem. In this setting it seems adequate to require 
only that the approximating basis functions be dense in an 
appropriate function space. As described in [121], the condi- 
tions required of basis functions to be dense in L" (R") are 
very weak. We introduce several new candidate compactly 
supported RBFs for approximating functions in L" (R") via 
over-determined least squares. 

First, in [23], we propose the bump function widely used in 
differential geometry 


(5.2) 


pr) = ex( Jura -n) 


Poy 


for use as an RBF activation function where H is the usual 
Heaviside step function. This compactly supported and infi- 
nitely differential function is also widely referred to as a 
mollifier. It is shown in FIG. 2A, and is qualitatively similar in 
nature to the widely applied non-compact Gaussian RBF, 
exp(-r)). Interestingly, the failure of the Gaussian to have 
compact support has led some researches to arbitrarily trun- 
cate it. We observe that the Gaussian RBF satisfies the posi- 
tive definiteness of the interpolation matrix for all space 
dimensions d=1. Note that while the mollifier function sat- 
isfies the postulates of Park and Sandberg’s theorem, it has 
non-positive values in its Fourier transform and hence does 
not satisfy Wendland’s interpolation criterion for a compact 
RBE [69a, 142]. 

A compact activation function with constant curvature is 
provided by 


en V I- HQ. 


This is just the quarter circle shown in FIG. 2B. Clearly this 
function satisfies the postulates of Park and Sandberg's theo- 
rem. 

Our last proposed activation function in [23] with compact 
support is the Hanning filter 


(5.3) 


o(r)=(cos(r)+1)H(1-r). (5.4) 


Like the bump function, this function is also infinitely differ- 
entiable; see FIG. 2C. It has advantages over the mollifier 
function in the manner in which the function approaches zero, 
i.e., there is no vanishing term in a denominator. 

While the quality of performance of a model can be 
assessed using a variety of directions (e.g., regularization 
methods and cross validation) the inherent conditioning of the 
model plays a critical role in its ability to generalize. In 
practice, if the data model is represented generally by the 
mapping y=f(x), of importance is how the output of the 
model changes as a consequence of perturbation of the input. 

For nonlinear mappings, such as those generated by multi- 
layer perceptrons, the estimation of the condition number is 
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complicated by the fact that the Jacobian of the map must be 
estimated at every point of interest [105]. This is also true in 
general for RBFs. However, in the case of RBFs we can 
determine the condition number associated with the pertur- 
bation of the parameters simply by computing the singular 
values ofthe interpolation matrix. This information provides 
an important measure ofthe sensitivity of the approximation 
model. 

We have observed that the nature of the condition number 
ofthe interpolation matrix (here we mean both the interpola- 
tion problem where this matrix is square and the overdeter- 
mined least squares problem where this matrix is tall) 
depends very significantly on the type of RBFs that are 
employed. The proposed three compactly supported func- 
tions (5.2)-(5.4) above possess especially good conditioning 
properties. We illustrate their utility on the benchmark 
Mackey-Glass time series data, [48, 36, 25, 47], in [23]. The 
Mackey-Glass time series is a mapping from a time-delay 
embedding ofthe univariate time-series to a future value. The 
Mackey-Glass time-delay equation 


ds(t) e 


s(t — T) 
457 —bs(t) - a 


1-s(r- o 


generates a chaotic time series with short-range time coher- 
ence, where long time prediction is very difficult; it has 
become a standard benchmark for testing model fitting algo- 
rithms. Further details regarding the Mackey-Glass time 
series is provided in the section titled *Mackey-Glass Time 
Series" hereinbelow. 

As a measure the performance of the various RBFs on the 
Mackey-Glass time series, we compare the root-mean-square 
error (RMSE), the number of basis functions required and the 
sensitivity of the models via the condition number of the 
interpolation matrix of the full model. We present the final 
result ofthe fit using the mollifier in FIG. 2A. In this figurethe 
output of the model and the associated target values are pre- 
sented. The results of using other RBFs are summaries in 
Table 5.1. Note that all the results are aimed for 95% confi- 
dence in the statistical test [3, 4]. 


TABLE 1 


This table shows the performance of 
different RBFs under identical strategy of fit. 


Wendland RBF Circle RBF — Mollifier 
ConditionNumber 3.0057e+003 12.5845 284.3114 
RMSE 0.0109 0.0344 0.0167 
NumberofRBFs 51 26 37 
Confidence % 95 95.27 95.53 


Additional Numerical Examples. 

This section presents several additional applications to 
demonstrate the performance of the algorithm of FIGS. 
1A-1C in higher dimensional domains (e.g., algorithms such 
asthose disclosed in FIGS. 8 and 9). The successful extension 
of this algorithm from one to higher dimensional domains 
requires the introduction of the notion of a space-time win- 
dow; 1.e., the spatio-temporal ball described hereinabove. 
Here we illustrate the impact of this concept on several appli- 
cations. Note that throughout all the following examples the 
same code was employed, in particular, there were no param- 
eters that were adjusted or tuned to the data set. 
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A Simple Manifold Example. 

In the present example a representation of data on a mani- 
fold as the graph of a function is provided for modeling. In 
particular, a graph 304 of the pringle data set is shown in FIG. 
3A, named as such given its similarity to the boundary of a 
potato chip by the same name; see also [9, 8]. The task is to 
construct a mapping from an (x, y) value in the plane to its 
corresponding z value on the pringle data set. Thus, we are 
attempting to fit or approximate the graph of a function from 
R? to R. Such graph fitting problems are at the center of the 
Whitney's manifold embedding theorem where 2m+1 dimen- 
sional domains suffice (in general) to write m dimensional 
manifolds as graphs; see [9, 8] for a discussion. 

Note that the pringle data set, as proposed in [10], can be 
generated as the solution to the following systems of ordinary 
differential equations 


dx 

dt LE 
dy 
dt 
dz 


t 


--x-6 ey -DY 


= -Az + Lay + wo? — y)), 


where y and wm are parameters. In FIG. 3A, a numerically 
integrated trajectory of an attracting cycle (308) is also 
shown, as one skilled in the art will understand. In this 
example, we are only concerned with fitting data on the limit 
cycle and ignore transients; i.e., points off of the limit cycle. 
From the fitted data of FIG. 3A, the data points of FIG. 3B 
were generated by corrupting the fitted data with Gaussian 
noise with standard deviation of 0.1, wherein there are 54 data 
points per cycle about the pringle graph 304. In particular, 
FIG. 3B shows the training set having of 101 points (almost 
two cycles) and testing data set consisting of 500 points, or 
almost 9 cycles. The fact that the solution is periodic will 
clearly illustrate the need for spatial as well as temporal 
windowing of the data. The algorithm of FIGS. 1A-1C is 
capable of learning a specific part of the trajectory with a 
small amount of data and generalizes well to the data that 
resides in the same region. 

FIGS. 4A through 4D show the ACC;* functions for the 
four major radial basis functions that capture the underlying 
structure of this data set. The diamonds in these figures indi- 
cate the points in the ACC,* function that contribute to the 
RBF at the corresponding stage, i.e., they belong to x; ,..,,;. In 
FIG. 4A we see that the spatio-temporal window collects data 
from two peaks indicating that we have cycled through the 
data twice in that region. This example clearly illustrates the 
difference between spatio-temporal windowing and temporal 
windowing: a time window would only use data from one 
cycle. We see the same effect in FIGS. 4B through 4D. 

FIGS. 5A through 5D show the location and shape of the 
four RBFs that are generated by the algorithm of. FIGS. 
1A-1C when modeling the data set of FIG. 3B before the ID 
stopping criteria described hereinabove is satisfied. In each of 
the figures, the training data (1.e., *;,,,;) is identified by a 
corresponding one of the jagged graphs 504a through 5044. 
Additionally in each of the figures, a graph (identified by the 
labels 508a through 5084) of the corresponding RBF gener- 
ated by the algorithm of FIGS. 1A-1C is displayed. 

FIG. 6A shows the maximum values of the corresponding 
ACC;* function for each step in the training process of FIGS. 
1A-1C (.e., steps 112 through 212). 
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FIG. 6B shows the performance of the model in the RMSE 
sense as the number of assigned RBFs increases while FIG. 
6C shows the confidence level at each stage of training. Note 
that the four RBFs model the data with RMSE of 0.1187 and 
9996 of points in the autocorrelation function resides in the 
95% confidence bands. Also note that neither the RMSE nor 
the values of y(h*) provide reliable stopping criteria in this 
example. FIG. 7 shows a plot ofthe output ofthe approxima- 
tion model generated using the four RBFs described FIGS. 
4A through 6C together with the target values of the testing 
set. 

In contrast to [89], in the present disclosure the notion of 
Atoca¡ Has been modified in such a way that it includes the data 
points between the two zero crossing of the ACC function, as 
one of skill in the art will understand. The reason for this 
modification is due to the fact that there is structure between 
the two places that the ACC(h) departs from the null state and 
there is less structure in the places that the amount of the 
ACC(h) is less than a certain value. There is a modification to 
the optimization routine. Additionally, in one embodiment, 
we have implemented the alternative decent directions 
method which only optimizes one parameter in each iteration. 

In addition, to avoid the overflow effect of the RBFs to the 
regions that are not in the local ball, in one embodiment, 
constraints in the optimization procedure may be imple- 
mented such that it penalizes the growth of RBF when 
increasing the error on the complement of the local space- 
time ball. 

Mackey-Glass Time Series. 

The Mackey-Glass Time-Delay Equation: 


s(t — T) 
ls(p-7)9* 


ds) _ (5.1) 


dt 


—bs(t)+a 


generates a chaotic time series with short-range time coher- 
ence, where long time prediction is very difficult; it has 
become a standard benchmark for testing model fitting algo- 
rithms [36, 25, 47]. 

The time series is generated by integrating the equation 
with model parameters a=0.2, b=0.1 and t=17 using the 
trapezoidal rule with At=1, with initial conditions y(t-t)=0.3 
for 0<t<t (t=17). The initial 1000 data points corresponding 
to transient behavior are discarded. Then 4000 data points are 
reserved for the training set. The test set consists of 500 data 
points starting from point 5001. Note that not all 4000 train- 
ing points collected were actually used for training the model. 
(These conditions are very similar to those in Platt [36].) 

For purposes of comparison with [48], the series 1s pre- 
dicted with v=50 samples ahead using four past samples: s,,, 
S, 5; 4, 15 aNd 4, ,&. Hence, we require that the model fit the 
input value 


La Sn S6 Sn 12/5518) 


to the output value s,,,,,. The y step prediction error is then 
€—$, 47 (S). S, 5. Sn-12> $, 18). As such, this time series pro- 
vides a good example for illustrating the construction of a 
nontrivial mapping from R* to R. 

FIG. 14A shows the initial ACF(h) values (computed on 
the training data; 1.e., step 140 of FIGS. 1A-1C) plotted 
against h, while FIG. 14B shows the final ACF ofthe residuals 
(used in steps 144 through 188 of FIGS. 1A-1C) that indicates 
that the model fitting process should be terminated given 95% 
confidence has been achieved. FIG. 14C and FIG. 14D show 
the associated ACC functions, i.e., the point-wise values f,, 
corresponding to the maximum value of the ACF in FIG. 14A 
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and FIG. 14B, respectively. From FIGS. 15A through 15C we 
see it is sufficient to use only 76 centers to get the 9596 
confidence fit for the Mackey-Glass data set with a resulting 
RMSE of 0.0116 (See FIG. 16, wherein the output of the 76 
mode model for the testing data set appears to fit the actual or 
target values very well. This example is interesting in that a 
large number of modes are required to attain the stopping 
criterion). 

The algorithm of, e.g., FIGS. 1A-1C, based on space-time 
balls, provides a result similar to MRAN [47] (RMSE of 
0.035) using 1500 data points with 21 centers. However, at 
this level of RMSE, both our algorithm (21 modes) and 
MRAN (24 modes and 4000 data points), produce sporadic 
but significant overshoots and undershoots of the function in 
regions of high gradient. These large pointwise errors are 
hidden to some degree by a relatively small RMSE. The IID 
test is, of course, point-wise and reveals local un-modeled 
structure in the data and prevents the algorithm from termi- 
nating prematurely. 

Yet, one might argue that stopping at 9596 confidence and 
76 modes is still premature stopping since a slightly improved 
final RMSE value of 0.0090 on the test data is achieved with 
109 modes (but then does not improve with more). However, 
this example is for the special case of noise-free data. In such 
instances it is recommended that the IID test be coupled with 
the RMSE test to draw optimal conclusions, unless, of course, 
one chooses to add noise artificially to the data. Given how 
close the RMSE errors are at 76 and 109 modes one must 
seriously consider that even in this case the 9596 confidence 
level is arguably superior. 

Time Series Prediction Using Exchange Rate Data Set. 

The data set for the present example consists of daily 
values of the Deutsche Mark/French Franc exchange rate 
over 701 days; FIG. 17 shows a graph of this data set. As 
mentioned in [30], this data set has irregular non-stationary 
components due to government intervention in the Europe 
exchange rate mechanism. Following [30], since there can be 
*day of week" effects in such data, a window of 5 previous 
values can be used as input, giving a data set of 696 patterns. 
Hence, this data set forms an interesting example of a map- 
ping from R? to R. 

The training data for the approximation model generated 
was taken to be the first 600 data points. The test data set was 
taken to be the last 96 data points. FIG. 18 shows the output of 
the resulting model (1-step prediction values) and the target 
(market) values for the test data. The modeling process ter- 
minated with a model of order three when the 9596 confidence 
threshold was attained (actually 97.17%). The ACC and 
RMSE criteria are also in agreement with the model order of 
three; see FIGS. 19A through 19C. The 3 mode model pro- 
duces the RMSE value of 0.0043, NPE,=0.2760 and 
NPE,=0.1033. The model has centers at (3.4933, 3.9292, 
3.2870, 3.8574, 4.0983), (3.2793, 3.3475, 3.3337, 
3.18433.2718) and (3.3666, 3.4187, 3.6620, 3.2056, 3.6457) 
with widths (0.4501, 2.7037, 2.2175, 2.5672, 2.9234), 
(0.1136, 0.1336, 8.5380, 0.6561, 0.5541) and (0.0555, 
0.0358, 0.1740, 0.1939, 0.4015), and weights 3.8595, 0.5751 
and 1.3805 respectively. 

FIGS. 20A through 20C highlight the patterns in the ACC 
functions associated to the maximum contributions of the 
ACFs in the process of adding the first 3 main RBFs. Note 
again, the need for the spatio-temporal window as evidenced 
in FIGS. 20B and 20C. FIG. 20B shows two distinct time 
regions contributing to the local data indicating either a peri- 
odic or quasi-periodic behavior. FIG. 20B illustrates the util- 
ity ofthe space time ball. The diamonds around points 150 to 
200 occur is a single time window whereas the diamonds 
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around points 420 to 480 (approximately) belong to another 
time window. However, all these points taken together form 
the local space-time ball. The resulting data set is more com- 
plete and affords better approximations. Moreover, a time- 
local windowing procedure such as used in prior art does not 
capture this global structure in the data. 

The embodiments of the various methods disclosed here- 
inabove for generating approximations of data sets may be 
applied to various fields of endeavor. In addition to the fields 
of financial analysis and image reconstruction described 
above, applications of FIGS. 1A-1C, 8 and/or 9 may be 
applied to simulations of: 

dynamic systems such as fluid dynamics of an airflow 
around a physical object, ora water flow over a boat hull, 
etc., wherein a training data set may be, e.g., sensor 
measurements at various points in time and space during 
a test in, e.g., a wind tunnel. Thus, the approximation 
model generated according to the present disclosure 
may be used to interpolate and/or extrapolate between 
and/or beyond the training data set; 

reconstruction of complex signals such as radio signals in 
noisy environments, e.g., the training data set may be 
measurements of noisy radio signal carrying a particular 
information content; 

target recognition (e.g., missile, aircraft, water and/or 
ground vehicle recognition), e.g., the training data set 
may measurements of heat signatures of targets, optical 
spectral measurements of paint on a target, optical sil- 
houette of a target; 

an automatic guidance or targeting system of a mobile 
device, such as an unmanned aircraft, rocket, or other 
vehicle, wherein the training and testing data sets may 
be, e.g., indicative of previous operations (successful 
and/or otherwise) of the mobile device; 

an automatic guidance or targeting system of a stationary 
device, such as stationary missile launcher, wherein the 
training and testing data sets may be, e.g., indicative of 
previous operations (successful and/or otherwise) ofthe 
stationary device; 

a real time physical process being modeled adaptively in 
(near) real time; such a process may be, e.g., data from a 
heat exchanger related to heat transfer efficiency, or real 
time stock market price fluctuations, wherein the train- 
ing and testing data sets may be, e.g., indicative of pre- 
vious operations (successful and/or otherwise) of the 
physical process; 

technical financial modeling and/or prediction, such mod- 
eling may be similar to the analysis of the daily values of 
the Deutsche Mark/French Franc exchange rate 
described hereinabove. However, other types of finan- 
cial analysis are within the scope of the present disclo- 
sure such as stock market analysis (particularly stocks 
for a particular market sector), bond market analysis, 
etc.; 

failure prediction of, e.g., a complex system such as a 
chemical plant, a munitions plant, a space station, etc., 
wherein the training data set is measurements from a 
plurality of sensors for monitoring conditions that relate 
to the safety and/or efficacy of one or more processes 
being performed; 

market research, wherein the training data set includes 
interview responses from a population sample, and 
wherein the resulting approximation model may be used 
to, e.g., provide suggestions to a particular product or 
service customer as to the options the customer might 
find desirable. 
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Note that since there are no ad hoc parameters that require 
setting in order to generate an appropriate model, an embodi- 
ment of one or more of the approximation techniques of 
FIGS. 1A-1C, 8 and/or 9 may be provided on a network such 
as the Internet, wherein a user need only provide a data type 
(or data structure) characterization of the training and testing 
data sets. Accordingly, once such a data characterization is 
provided, then the training and testing data sets may be pro- 
vided for analysis by a user having substantially no back- 
ground in modeling, simulation, approximation theory and/or 
artificial intelligence. FIG. 21 shows an illustrative embodi- 
ment of a system for providing modeling services according 
to one or more of the processes of FIGS. 1A-1C, 8, and/or 9. 
Accordingly, an approximation server 2104 may be accessed 
by users at various user stations 2108, or directly as shown. In 
particular, a user may: 

(1) request modeling services from the server 2104 via the 
user interface 2112, wherein this user interface may 
include a capability of connecting to the network 2116, 
or alternatively, the user interface may allow a user to 
interact directly with the server 2104; 

(i1) supply data structure information for data to be mod- 
eled, e.g., training data or testing data, e.g., such format- 
ting may indicate a maximum number of digits in each 
input value, the maximum number digits in model data 
output to the user, the dimension of the input data, the 
dimension of output model data, etc.; 

(iii) input training data; 

(iv) input testing data; 

(v) input control information such as the maximal number 
of basis functions to use, optionally, the type of basis 
functions to use, and/or optionally, a stopping condition 
for indicating when a model being generated is suffi- 
ciently accurate; 

(vi) receive reports related to, e.g., the accuracy of a model 
generated on the training data, such as RMSE. 

Accordingly, at a high level a user first request to initiate a 
modeling session with the server 2104, wherein the user 
interface 2112 contacts the controller 2120 for initiating the 
session. Since the user data repository 2124 contains identi- 
fication of each user as well as modeling data (both training 
and testing data) for each model (to be) generated, the con- 
troller 2120 accesses the user data repository 2124 for iden- 
tifying the user, for initiating a new modeling session (e.g., 
with new data), for continuing a previous modeling session. 
Accordingly, the controller 2120 may input training and/or 
testing data into the user data repository, wherein such data is 
identified both by, e.g., a user identifier and an identifier 
indicative of the model (to be) generated. After at least the 
training data for a particular model is stored in the user data 
repository 2124, the user whose identification is associated 
with the training data may request that the controller 2120 
activate the approximation model generator 2128 for gener- 
ating a model, e.g., based on the pseudo code of FIGS. 1A-1C, 
8 or 9. Note that as an option, a user may provide control 
information such as the type of basis functions to use (e.g., 
regular Gaussian RBFs, or Gaussian-Gaussian RBFs, RBFs 
based on the mollifier function or the quarter circle), and/or a 
stopping criterion (e.g., a confidence level of 9596 or 9896, or 
a RMSE of some particular value). However, it is within the 
scope of the present disclosure that such options can be auto- 
matically activated in various combinations by the controller 
2120 upon receiving feedback from the approximation model 
generator 2128. Additionally, note that the generator 2128 
may perform utilize parallel computing techniques for gen- 
erating a model. For example, each of a plurality of proces- 
sors (CPUs) may beactivated for determining one ofthe basis 
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functions of the model on a particular X;,,,, as described 
hereinabove. Note that in one embodiment, different proces- 
sors may perform the methods described herein (e.g., those of 
FIGS. 1A-1C, 8, and/or 9), wherein the optimization tasks are 
performed according to different cost functions, e.g., differ- 
ent norms such as an L, norm, L, norm, or L. norm. 

Accordingly, once the generator 2128 has completed an 
instance of training, the generator 2128 outputs the data 
defining the generated model to the approximation models 
database 2132 together with various other types of informa- 
tion such as the final ACF, the final RMSE, a confidence level 
(e.g., as described hereinabove) the number of basis functions 
used in generating the model, etc. Moreover, the user may be 
notified of such results. 

As indicated above, the user may also input testing data. 
Such testing data is also associated with the particular user 
and model to which such testing data corresponds. Accord- 
ingly, the controller 2120 may activate an approximation 
model analyzer 2136 performing various statistical tests 
regarding how well the model conforms to and/or predicts the 
testing data. 

Output from the model generator 2128 and/or the approxi- 
mation model analyzer 2136 may include various statistics 
and/or graphs related to how well the generated model con- 
forms to the training data. In particular, one or more of the 
various graphs and/or statistics illustrated in FIGS. 4A-4D, 
5A-5D, 6A-6C, 7, 10A-10B, 11A, 11B, 12A-12C, 13A, 13B, 
14A-14D, 15A-15C, 16, 17, 18, 19, and/or 20. Note that the 
data for such output may be stored in the approximation 
models database 2132, and associated with the data for defin- 
ing the corresponding generated model (e.g., the list of RBFs 
instantiated in the generated model). Moreover, note that the 
user data repository 2124 and the approximation models data- 
base 2132 can be combined into a single data repository as 
one skilled in the art will understand. 

The foregoing discussion of the invention has been pre- 
sented for purposes of illustration and description. Further, 
the description is not intended to limit the invention to the 
form disclosed herein. Consequently, variation and modifica- 
tion commiserate with the above teachings, within the skill 
and knowledge ofthe relevant art, are within the scope ofthe 
present invention. The embodiment described hereinabove is 
further intended to explain the best mode presently known of 
practicing the invention and to enable others skilled in the art 
to utilize the invention as such, or in other embodiments, and 
with the various modifications required by their particular 
application or uses of the invention. 

What is claimed is: 

1. A method for facilitating voice recognition using a mod- 
eling relationship between first and second data collections, 
wherein for each member of the first data collection, there is 
a corresponding member of the second collection, compris- 
ing: 

determining residuals between the first and second collec- 

tions; 

determining, using the residuals, a position of one of the 

members of the first data collection; 

determining proximity data relative to a value (V) for the 

one member of the first data collection at the position; 
determining a subcollection ofthe first collection, wherein 
each member of the subcollection has a value that is in 
proximity to the value V according to the proximity data; 
generating a basis function by a computer from the subc- 
ollection for obtaining a model of the relationship; and 
outputting model information for presentation to a user or 
a predetermined process for affecting or identifying a 
physical event, wherein the model information includes 
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at least one of: (a) data indicative of a correspondence 
between the model and the relationship, (b) data indica- 
tive of a variance between the model and the relation- 
ship, (c) an extrapolation ofthe relationship, (d) an inter- 
polation of the relationship, (e) multi-dimensional 
model output, and (f) notification of the physical event 
and wherein an approximation function is modulated, or 
skewed, by a shape function; and facilitating voice rec- 
ognition by using the modeling relationship between the 
first and second data collections. 

2. The method of claim 1, wherein the approximation func- 
tion is selected form the group consisting of basis functions, 
multi-layer perceptrons, and feed-forward neural networks. 

3. The method of claim 1, wherein the steps of claim 1 are 
iteratively performed for generating a model of the relation- 
ship, and with at least most iterations, a step of determining 
structure or information content in the residuals. 

4. The method of claim 3, wherein for each of at least most 
ofthe iterations, the corresponding instance of the subcollec- 
tion is determined accordingly to an instance of the position 
determined as a function of a plurality of autocorrelation 
components. 

5. The method of claim 3, wherein the second collection 
has a dimensionality greater than one, and the step of deter- 
mining the position includes, for each of at least most of the 
iterations, performing a test on the residuals for determining 
whether or not the steps of claim 1 are to be performed again. 

6. The method of claim 5, wherein the test includes using 
the residuals to determine an error that is a function of more 
than one of the dimensions of the second collection. 

7. The method of claim 1, wherein the step of determining 
the position includes performing an autocorrelation for inde- 
pendent identically and distributed in the residuals. 

8. The method of claim 1, wherein the approximation func- 
tion is a basis function, wherein the step of generating 
includes determining at least one parameter value for the 
basis function by iteratively adjusting the parameter value for 
reducing a result of a function dependent upon a difference 
between aninstance ofthe basis function, and members ofthe 
second data collection that correspond to the members of the 
subcollection. 

9. The method of claim 1, wherein the approximation func- 
tion is a basis function, wherein the basis function is gener- 
ated using a mollifier, circle or bump function. 

10. The method of claim 1 further including a step of 
testing the model on additional data indicative ofthe relation- 
ship, wherein the additional data includes at least one ofa first 
member related to a second member according to the rela- 
tionship, and wherein at least one of: the first member is not a 
member of the first data collection, and the second member is 
not a member of the second data collection. 

11. The method of claim 1, wherein the physical event 
includes one of: an airflow around a physical object, a water 
flow over an object, a radio signal, and recognition of an 
object, maneuvering of an object. 

12. The method of claim 1, wherein a confidence level for 
a plurality of functions of the residuals is determined at least 
one of simultaneously and sequentially. 

13. The method of claim 1, wherein the data collections are 
high-dimensional data collections. 

14. The method of claim 1, wherein the data collections are 
spatial and without time parameterization. 

15. The method of claim 1, wherein the approximation 
function is selected from the group consisting of radial basis 
functions and modulated asymmetric radial basis functions. 
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16. The method of claim 1, wherein at least one of the first 
data collection and second data collection is substantially 
noise free. 

17. The method of claim 1, wherein the approximation 
function is a non-symmetrical radial basis function with at 
least one of compact support and non-compact support. 

18. A method for failure prediction using a modeling rela- 
tionship between first and second data collections, wherein 
for each member of the first data collection, there is a corre- 
sponding member of the second collection, comprising: 

determining residuals between the first and second collec- 

tions; 

determining, using the residuals, a position of one of the 

members of the first data collection; 
determining proximity data relative to a value (V) for the 
one member of the first data collection at the position; 

determining a subcollection ofthe first collection, wherein 
each member of the subcollection has a value that is in 
proximity to the value V according to the proximity data; 

generating a basis function by a computer from the subc- 
ollection for obtaining a model of the relationship; and 

outputting model information for presentation to a user or 
a predetermined process for affecting or identifying a 
physical event, wherein the model information includes 
at least one of: (a) data indicative of a correspondence 
between the model and the relationship, (b) data indica- 
tive of a variance between the model and the relation- 
ship, (c) an extrapolation ofthe relationship, (d) an inter- 
polation of the relationship, (e) multi-dimensional 
model output, and (f) notification of the physical event 
and wherein an approximation function is modulated, or 
skewed, by a shape function; and facilitating failure 
prediction by using the modeling relationship between 
the first and second data collections. 

19. A method for image processing using a modeling rela- 
tionship between first and second data collections, wherein 
for each member of the first data collection, there is a corre- 
sponding member of the second collection, comprising: 

determining residuals between the first and second collec- 

tions; 

determining, using the residuals, a position of one of the 

members of the first data collection; 
determining proximity data relative to a value (V) for the 
one member of the first data collection at the position; 

determining a subcollection ofthe first collection, wherein 
each member of the subcollection has a value that is in 
proximity to the value V according to the proximity data; 

generating a basis function by a computer from the subc- 
ollection for obtaining a model of the relationship; and 

outputting model information for presentation to a user or 
a predetermined process for affecting or identifying a 
physical event, wherein the model information includes 
at least one of: (a) data indicative of a correspondence 
between the model and the relationship, (b) data indica- 
tive of a variance between the model and the relation- 
ship, (c) an extrapolation ofthe relationship, (d) an inter- 
polation of the relationship, (e) multi-dimensional 
model output, and (f) notification of the physical event 
and wherein an approximation function is modulated, or 
skewed, by a shape function; and facilitating image pro- 
cessing by using the modeling relationship between the 
first and second data collections. 

20. A method for financial time series analysis using a 
modeling relationship between first and second data collec- 
tions, wherein for each member of the first data collection, 
there is a corresponding member of the second collection, 
comprising: 
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determining residuals between the first and second collec- 
tions; 

determining, using the residuals, a position of one of the 
members of the first data collection; 

determining proximity data relative to a value (V) for the 
one member of the first data collection at the position; 

determining a subcollection of the first collection, wherein 
each member of the subcollection has a value that is in 
proximity to the value V according to the proximity data; 

generating a basis function by a computer from the subc- 
ollection for obtaining a model of the relationship; and 

outputting model information for presentation to a user or 
a predetermined process for affecting or identifying a 
physical event, wherein the model information includes 
at least one of: (a) data indicative of a correspondence 
between the model and the relationship, (b) data indica- 
tive of a variance between the model and the relation- 
ship, (c) an extrapolation ofthe relationship, (d) an inter- 
polation of the relationship, (e) multi-dimensional 
model output, and (f) notification of the physical event 
and wherein an approximation function is modulated, or 
skewed, by a shape function; and facilitating financial 
time series analysis by using the modeling relationship 
between the first and second data collections. 
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